CogLTX: Applying BERT to Long Texts

Neural Information Processing Systems

BERT is incapable of processing long texts because its memory and time consumption grow quadratically with input length. The most natural ways to address this problem, such as slicing the text with a sliding window or simplifying transformers, either suffer from insufficient long-range attention or need customized CUDA kernels.
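
To make the sliding-window limitation concrete, here is a minimal sketch in plain Python. It is not the method of the paper; the function names `sliding_windows` and `attention_entries`, and the window and stride values, are assumptions chosen for illustration. It shows how slicing reduces the number of attention entries, and why two tokens that never share a window cannot attend to each other.

```python
def sliding_windows(tokens, window=512, stride=256):
    """Split a token sequence into overlapping windows (hypothetical helper).

    window: maximum tokens per slice (512 is an assumed per-slice limit).
    stride: how far the window start advances between slices.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reaches the end of the text
        start += stride
    return windows


def attention_entries(seq_len):
    """Number of entries in a full self-attention matrix: seq_len squared."""
    return seq_len * seq_len


tokens = list(range(2000))  # stand-in for the token ids of a long text
chunks = sliding_windows(tokens)

full_cost = attention_entries(len(tokens))
windowed_cost = sum(attention_entries(len(c)) for c in chunks)

# Tokens 0 and 1999 never appear in the same window, so no slice
# can model an interaction between them.
share_window = any(0 in c and 1999 in c for c in chunks)

print(len(chunks), full_cost, windowed_cost, share_window)
```

Doubling the sequence length quadruples `attention_entries`, which is the quadratic growth the abstract refers to. The windowed total is smaller, but only because all attention across distant windows has been dropped.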


